The Foundations of Cost-Sensitive Learning
Charles Elkan
Abstract
This paper revisits the problem of optimal learning and decision-making when different misclassification errors incur different penalties. We characterize precisely but intuitively when a cost matrix is reasonable, and we show how to avoid the mistake of defining a cost matrix that is economically incoherent. For the two-class case, we prove a theorem that shows how to change the proportion of negative examples in a training set in order to make optimal cost-sensitive classification decisions using a classifier learned by a standard non-cost-sensitive learning method. However, we then argue that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods. Accordingly, the recommended way of applying one of these methods in a domain with differing misclassification costs is to learn a classifier from the training set as given, and then to compute optimal decisions explicitly using the probability estimates given by the classifier.

1 Making decisions based on a cost matrix

Given a specification of costs for correct and incorrect predictions, an example should be predicted to have the class that leads to the lowest expected cost, where the expectation is computed using the conditional probability of each class given the example. Mathematically, let the (i, j) entry in a cost matrix C be the cost of predicting class i when the true class is j. If i = j then the prediction is correct, while if i ≠ j the prediction is incorrect. The optimal prediction for an example x is the class i that minimizes

    L(x, i) = \sum_j P(j|x) C(i, j)        (1)

Costs are not necessarily monetary. A cost can also be a waste of time, or the severity of an illness, for example. For each i, L(x, i) is a sum over the alternative possibilities for the true class of x. In this framework, the role of a learning algorithm is to produce a classifier that for any example x can estimate the probability P(j|x) of each class j being the true class of x. For an example x, making the prediction i means acting as if i is the true class of x. The essence of cost-sensitive decision-making is that it can be optimal to act as if one class is true even when some other class is more probable. For example, it can be rational not to approve a large credit card transaction even if the transaction is most likely legitimate.

1.1 Cost matrix properties

A cost matrix C always has the following structure when there are only two classes:

                        actual negative       actual positive
    predict negative    C(0,0) = c_00         C(0,1) = c_01
    predict positive    C(1,0) = c_10         C(1,1) = c_11

Recent papers have followed the convention that cost matrix rows correspond to alternative predicted classes, while columns correspond to actual classes, i.e. row/column = i/j = predicted/actual. In our notation, the cost of a false positive is c_10 while the cost of a false negative is c_01. Conceptually, the cost of labeling an example incorrectly should always be greater than the cost of labeling it correctly. Mathematically, it should always be the case that c_10 > c_00 and c_01 > c_11. We call these conditions the "reasonableness" conditions. Suppose that the first reasonableness condition is violated, so c_00 ≥ c_10 but still c_01 > c_11. In this case the optimal policy is to label all examples positive. Similarly, if c_10 > c_00 but c_11 ≥ c_01 then it is optimal to label all examples negative. We leave the case where both reasonableness conditions are violated for the reader to analyze.
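As a concrete illustration, the following minimal Python sketch implements the decision rule of Equation (1) and the reasonableness conditions just stated. The function names, the hypothetical credit card costs, and the probability estimates are assumptions made here for illustration, not values taken from the paper.

# A minimal sketch of the decision rule in Equation (1): predict the class i
# that minimizes the expected cost L(x, i) = sum_j P(j|x) * C(i, j).
# Function names and example numbers are illustrative, not from the paper.
import numpy as np

def optimal_prediction(probs, cost_matrix):
    """Return the class i minimizing expected cost, given estimates of P(j|x).

    probs       : array of shape (k,), probability estimates P(j|x)
    cost_matrix : array of shape (k, k), where C[i, j] is the cost of
                  predicting class i when the true class is j
    """
    expected_costs = np.asarray(cost_matrix) @ np.asarray(probs)  # L(x, i) for each i
    return int(np.argmin(expected_costs))

def is_reasonable(cost_matrix):
    """Two-class reasonableness conditions: c_10 > c_00 and c_01 > c_11."""
    c = np.asarray(cost_matrix)
    return bool(c[1, 0] > c[0, 0] and c[0, 1] > c[1, 1])

# Hypothetical credit card setting: class 0 = legitimate, class 1 = fraudulent.
# Approving a fraudulent transaction (false negative) costs 50; declining a
# legitimate one (false positive) costs 2; correct decisions cost nothing.
C = np.array([[0.0, 50.0],
              [2.0,  0.0]])
print(is_reasonable(C))                      # True
# Even when fraud has probability only 0.08, declining (predicting class 1)
# has lower expected cost: 2 * 0.92 = 1.84 versus 50 * 0.08 = 4.0.
print(optimal_prediction([0.92, 0.08], C))   # 1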
Margineantu [2000] has pointed out that for some cost matrices, some class labels are never predicted by the optimal policy as given by Equation (1). We can state a simple, intuitive criterion for when this happens. Say that row m dominates row n in a cost matrix C if for all j, C(m, j) ≥ C(n, j). In this case the cost of predicting n is no greater than the cost of predicting m, regardless of what the true class j is. So it is optimal never to predict m. As a special case, the optimal prediction is always n if row n is dominated by all other rows in a cost matrix. The two reasonableness conditions for a two-class cost matrix imply that neither row in the matrix dominates the other.

Given a cost matrix, the decisions that are optimal are unchanged if each entry in the matrix is multiplied by a positive constant. This scaling corresponds to changing the unit of account for costs. Similarly, the decisions that are optimal are unchanged if a constant is added to each entry in the matrix. This shifting corresponds to changing the baseline away from which costs are measured. By scaling and shifting entries, any two-class cost matrix that satisfies the reasonableness conditions can be transformed into a simpler matrix that always leads to the same decisions:

                        actual negative       actual positive
    predict negative    0                     c'_01
    predict positive    1                     c'_11

where c'_01 = (c_01 - c_00)/(c_10 - c_00) and c'_11 = (c_11 - c_00)/(c_10 - c_00).
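The dominance criterion and the invariance of optimal decisions under positive scaling and shifting can likewise be checked numerically. The sketch below is illustrative only: the helper names and the example matrix are assumptions, and normalize applies one convenient shift-and-scale (sending c_00 to 0 and c_10 to 1), which is well defined whenever the first reasonableness condition c_10 > c_00 holds.

# A sketch of two facts stated above: (a) a row that dominates another row is
# never the optimal prediction, and (b) multiplying every entry of C by a
# positive constant or adding a constant to every entry leaves the optimal
# decisions unchanged. Helper names and the example matrix are illustrative.
import numpy as np

def rows_never_predicted(C):
    """Rows m with C(m, j) >= C(n, j) for all j, for some other row n;
    by the dominance argument, such an m is never needed as a prediction."""
    C = np.asarray(C, dtype=float)
    k = C.shape[0]
    return [m for m in range(k)
            if any(n != m and np.all(C[m] >= C[n]) for n in range(k))]

def decide(C, probs):
    """Optimal prediction under cost matrix C for probability estimates probs."""
    return int(np.argmin(np.asarray(C) @ probs))

def normalize(C):
    """One convenient shift-and-scale: make c_00 = 0 and c_10 = 1.
    Valid whenever the first reasonableness condition c_10 > c_00 holds."""
    C = np.asarray(C, dtype=float)
    return (C - C[0, 0]) / (C[1, 0] - C[0, 0])

C = np.array([[1.0, 40.0],
              [6.0,  2.0]])                  # a reasonable two-class matrix
print(rows_never_predicted(C))               # [] -- neither row dominates the other
C_simple = normalize(C)                      # [[0.0, 7.8], [1.0, 0.2]]

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet([1.0, 1.0])            # a random probability estimate P(.|x)
    assert decide(C, p) == decide(3.0 * C + 5.0, p) == decide(C_simple, p)
print("optimal decisions unchanged by scaling, shifting, and normalization")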